Twitter-Network Topic Model: A Full Bayesian Treatment for Social Network and Text Modeling

Authors

  • Kar Wai Lim
  • Changyou Chen
  • Wray L. Buntine
Abstract

Twitter data is extremely noisy: each tweet is short, unstructured, and written in informal language, which poses a challenge for current topic modeling. On the other hand, tweets are accompanied by extra information such as authorship, hashtags, and the user-follower network. Exploiting this additional information, we propose the Twitter-Network (TN) topic model to jointly model the text and the social network in a full Bayesian nonparametric way. The TN topic model employs hierarchical Poisson-Dirichlet processes (PDP) for text modeling and a Gaussian process random function model for social network modeling. We show that the TN topic model significantly outperforms several existing nonparametric models due to its flexibility. Moreover, the TN topic model enables additional informative inference, such as authors' interests and hashtag analysis, and leads to further applications such as author recommendation, automatic topic labeling, and hashtag suggestion. Note that our general inference framework can readily be applied to other topic models with embedded PDP nodes.

∗ Part of this work is now published in the journal article: Lim, K. W., Buntine, W. L., Chen, C., and Du, L. (2016). Nonparametric Bayesian topic modelling with the hierarchical Pitman-Yor processes. International Journal of Approximate Reasoning, 78:172–191. [Lim et al., 2016]
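As a rough illustration of the PDP (Pitman-Yor process) building block named in the abstract, the sketch below draws a sequence from a single Pitman-Yor process using its Chinese restaurant representation. It is not the TN topic model or its inference procedure: the function name `sample_pyp_crp`, its parameters, and the toy vocabulary are all hypothetical, chosen only to show how the discount and concentration parameters control the reuse of existing values versus fresh draws from the base distribution.

```python
import random
from collections import Counter


def sample_pyp_crp(n_customers, discount, concentration, base_draw, rng):
    """Draw n_customers values from a Pitman-Yor process via its
    Chinese restaurant representation (hypothetical helper, illustration only)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    table_counts = []  # number of customers seated at each table
    table_dishes = []  # value (e.g. a word) served at each table
    draws = []

    for _ in range(n_customers):
        if not table_counts:
            # The first customer always opens a new table.
            k = 0
        else:
            # Existing table k has weight (count_k - discount);
            # a new table has weight (concentration + discount * num_tables).
            weights = [c - discount for c in table_counts]
            weights.append(concentration + discount * len(table_counts))
            k = rng.choices(range(len(weights)), weights=weights)[0]

        if k == len(table_counts):
            # New table: its dish is a fresh draw from the base distribution.
            table_counts.append(1)
            table_dishes.append(base_draw(rng))
        else:
            table_counts[k] += 1
        draws.append(table_dishes[k])
    return draws


# Toy usage: the base distribution is uniform over a made-up vocabulary.
vocab = ["tweet", "hashtag", "follower", "topic", "author"]
words = sample_pyp_crp(
    200,
    discount=0.5,
    concentration=1.0,
    base_draw=lambda r: r.choice(vocab),
    rng=random.Random(0),
)
print(Counter(words).most_common(3))
```

A larger discount tends to open more tables and produce heavier-tailed, power-law-like frequencies, a property often cited as a reason to prefer Pitman-Yor priors for word counts. In a hierarchical model, the base distribution of one such process would itself be another process of the same kind.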


Similar resources

Green Supply Chain Risk Network Management and Performance Analysis: Bayesian Belief Network Modeling

With the increase in environmental awareness, competitions and government policies, implementation of green supply chain management activities to sustain production and conserve resources is becoming more necessary for different organizations. However, it is difficult to successfully implement green supply chain (GSC) activities because of the risks involved. These risks alongside their resourc...


A Model for Tax Evasion Forcasting based on ID3 Algorithm and Bayesian Network

Nowadays, knowledge is a valuable and strategic source as well as an asset for evaluation and forecasting. Presenting these strategies in discovering corporate tax evasion has become an important topic today and various solutions have been proposed. In the past, various approaches to identify tax evasion and the like have been presented, but these methods have not been very accurate and the ove...


Celebrity Recommendation with Collaborative Social Topic Regression

Recently how to recommend celebrities to the public becomes an interesting problem on the social network websites, such as Twitter and Tencent Weibo. In this paper, we proposed a unified hierarchical Bayesian model to recommend celebrities to the general users. Specifically, we proposed to leverage both social network and descriptions of celebrities to improve the prediction ability and recomme...


The modeling of body's immune system using Bayesian Networks

In this paper, the urinary infection, that is a common symptom of the decline of the immune system, is discussed based on the well-known algorithms in machine learning, such as Bayesian networks in both Markov and tree structures. A large scale sampling has been executed to evaluate the performance of Bayesian network algorithm. A number of 4052 samples were obtained from the database of the Tak...


Design and Test of the Real-time Text mining dashboard for Twitter

One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...



Journal title:
  • CoRR

Volume abs/1609.06791  Issue

Pages  -

Publication date 2013